Sparse, Dense, and Attentional Representations for Text Retrieval
Authors
Abstract
Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query. We investigate the capacity of this architecture relative to sparse bag-of-words models and attentional neural networks. Using both theoretical and empirical analysis, we establish connections between the encoding dimension, the margin between gold and lower-ranked documents, and the document length, suggesting limitations in the capacity of fixed-length encodings to support precise retrieval of long documents. Building on these insights, we propose a simple neural model that combines the efficiency of dual encoders with some of the expressiveness of more costly attentional architectures, and explore sparse-dense hybrids to capitalize on the precision of sparse retrieval. These models outperform strong alternatives in large-scale retrieval.
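A minimal sketch of the dual-encoder scoring described in the abstract: documents and the query are mapped to dense fixed-length vectors, and each document is scored by its inner product with the query. The random vectors below are stand-ins for learned encoders; the dimension and corpus size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8        # encoding dimension (illustrative)
n_docs = 5   # toy corpus size

# Dense encodings; in practice these come from trained neural encoders.
doc_vecs = rng.normal(size=(n_docs, d))
query_vec = rng.normal(size=d)

# Score every document by its inner product with the query,
# then rank documents from highest to lowest score.
scores = doc_vecs @ query_vec
ranking = np.argsort(-scores)
```

Because all documents share one score function (a single inner product), retrieval reduces to a maximum-inner-product search over precomputed document vectors, which is what makes dual encoders efficient at scale.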
Similar resources
Sparse representations for text categorization
Sparse representations (SRs) are often used to characterize a test signal using few support training examples, and allow the number of supports to be adapted to the specific signal being categorized. Given the good performance of SRs compared to other classifiers for both image classification and phonetic classification, in this paper, we extended the use of SRs for text classification, a metho...
Boosting Sparse Representations for Image Retrieval
In this thesis, we developed and implemented a method for creating sparse representations of real images for image retrieval. Feature selection occurs both offline by choosing highly selective features and online via “boosting”. A tree of repeated filtering with simple kernels is used to compute the initial set of features. A lower dimensional representation is then found by selecting the most ...
Subspace Clustering Reloaded: Sparse vs. Dense Representations
State-of-the-art methods for learning unions of subspaces from a collection of data leverage sparsity to form representations of each vector in the dataset with respect to the remaining vectors in the dataset. The resulting sparse representations can be used to form a subspace affinity matrix to cluster the data into their respective subspaces. While sparsity-driven methods for subspace cluster...
Dense Mapping for Range Sensors: Efficient Algorithms and Sparse Representations
This paper focuses on efficient occupancy grid building based on wavelet occupancy grids, a new sparse grid representation, and on a new update algorithm for range sensors. The update algorithm takes advantage of the natural multiscale properties of the wavelet expansion to update only the parts of the environment that are modified by the sensor measurements, and at the proper scale. The sparse wave...
Combining Text Vector Representations for Information Retrieval
This paper suggests a novel representation for documents that is intended to improve precision. This representation is generated by combining two central techniques: Random Indexing; and Holographic Reduced Representations (HRRs). Random indexing uses co-occurrence information among words to generate semantic context vectors that are the sum of randomly generated term identity vectors. HRRs are...
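The Random Indexing step described in the snippet above can be sketched as follows: each term is assigned a sparse, randomly generated identity vector, and a word's semantic context vector is the sum of the identity vectors of its co-occurring words. The dimension, sparsity, and vocabulary below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16  # context-vector dimension (illustrative)
vocab = ["dense", "sparse", "retrieval", "vector"]

# Random ternary identity vectors: mostly zeros with a few +/-1 entries,
# a common choice for Random Indexing.
identity = {w: rng.choice([-1, 0, 1], size=dim, p=[0.1, 0.8, 0.1])
            for w in vocab}

def context_vector(cooccurring_words):
    """Semantic context vector: sum of identity vectors of co-occurring words."""
    return np.sum([identity[w] for w in cooccurring_words], axis=0)

cv = context_vector(["sparse", "retrieval"])
```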
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2021
ISSN: 2307-387X
DOI: https://doi.org/10.1162/tacl_a_00369